Novel Definition and Algorithm for Chaining Fragments with Proportional Overlaps

Authors

  • Raluca Uricaru
  • Alban Mancheron
  • Eric Rivals
Abstract

Chaining fragments is a crucial step in genome alignment. Existing chaining algorithms compute a maximum weighted chain in which no overlaps are allowed between adjacent fragments. In practice, using local alignments as fragments instead of Maximal Exact Matches (MEMs) generates frequent overlaps between fragments, for both combinatorial reasons and biological factors, i.e., variable tandem repeat structures that differ in copy number between genomic sequences. In this article, to lift this limitation, we formulate a novel definition of a chain that allows overlaps proportional to the fragment lengths, and present an efficient algorithm for computing such a maximum weighted chain. We tested our algorithm on a dataset of 694 genome pairs and observed significant improvements in coverage, while keeping running times within reasonable limits. Moreover, experiments with different ratios of allowed overlap showed that the chains are robust with respect to these ratios. Our algorithm is implemented in a tool called OverlapChainer (OC), which is available from the authors upon request.
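To make the idea of a chain with bounded overlaps concrete, here is a minimal Python sketch. It is not the algorithm of the article or the OverlapChainer implementation. It assumes that a fragment is a pair of half-open intervals (one per genome) with a weight, that a fragment may follow another only if it starts and ends later on both genomes, and that the overlap on each genome may not exceed `ratio` times the length of the shorter of the two intervals. It also assumes the chain weight is the plain sum of fragment weights, and it uses a simple quadratic dynamic program rather than an efficient one. The names `Fragment`, `overlap_allowed`, and `best_chain` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    # Half-open intervals [start, end) on genome A and genome B (illustrative layout).
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    weight: float


def overlap_allowed(prev, nxt, ratio):
    """Assumed rule: nxt may follow prev if, on each genome, it starts and ends
    strictly later and overlaps prev by at most ratio * (shorter interval length)."""
    pairs = (
        (prev.a_start, prev.a_end, nxt.a_start, nxt.a_end),
        (prev.b_start, prev.b_end, nxt.b_start, nxt.b_end),
    )
    for p_start, p_end, n_start, n_end in pairs:
        if n_start <= p_start or n_end <= p_end:
            return False
        overlap = max(0, p_end - n_start)
        limit = ratio * min(p_end - p_start, n_end - n_start)
        if overlap > limit:
            return False
    return True


def best_chain(fragments, ratio):
    """Return (score, chain) for a maximum-weight chain under the assumed rule.

    Quadratic-time dynamic program; the score is the plain sum of weights
    (a simplifying assumption, overlaps are not penalized)."""
    frags = sorted(fragments, key=lambda f: (f.a_start, f.a_end))
    if not frags:
        return 0.0, []
    score = [f.weight for f in frags]  # best chain score ending at each fragment
    back = [-1] * len(frags)           # predecessor index, -1 for chain start
    for j, cur in enumerate(frags):
        for i in range(j):
            candidate = score[i] + cur.weight
            if candidate > score[j] and overlap_allowed(frags[i], cur, ratio):
                score[j] = candidate
                back[j] = i
    end = max(range(len(frags)), key=score.__getitem__)
    best = score[end]
    chain = []
    while end != -1:
        chain.append(frags[end])
        end = back[end]
    chain.reverse()
    return best, chain
```

Calling `best_chain(fragments, 0.0)` reproduces the classical setting with no overlaps, while a larger `ratio` lets slightly overlapping local alignments be chained together, which is the kind of coverage gain the abstract describes.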

Similar resources

Optimal Portfolio Allocation based on two Novel Risk Measures and Genetic Algorithm

The problem of optimal portfolio selection has attracted great attention in the fields of finance and optimization. The future stock price should be predicted with acceptable precision, and a suitable model and criterion for the risk and expected return of the stock portfolio should be proposed in order to solve the optimization problem. In this paper, two new criteria for the risk of stock pr...

Match Chaining Algorithms for cDNA Mapping

We propose a new algorithm called the MCCM (Match Chaining-based cDNA Mapping) algorithm that allows mapping cDNAs to the genomes efficiently and accurately, utilizing local matches called MUMs (maximal unique matches) or MRMs (maximal rare matches) obtained with suffix trees. From the MUMs (or MRMs), our algorithm selects appropriate matches which are related to the cDNA mapping. We call the s...

Optimal intelligent control for glucose regulation

This paper introduces a novel control methodology based on a fuzzy controller for the glucose-insulin regulatory system of a type I diabetes patient. First, in order to incorporate knowledge about patient treatment, a fuzzy logic controller is employed for regulating the gains of the basis Proportional-Integral (PI) as a self-tuning controller. Then, to overcome the key drawback of fuzzy logic contro...

Chaining Multiple-Alignment Fragments in Sub-Quadratic

We describe a multiple-sequence alignment algorithm for determining the highest-scoring alignment that can be obtained by chaining together non-overlapping subalignments selected from a given collection of such "fragments". For a given set of K sequences, a problem instance consists of a set of F precomputed fragments, an alignment score for each fragment, and a "gap" penalty function that assi...

A multi-stage stochastic programming for condition-based maintenance with proportional hazards model

Condition-Based Maintenance (CBM) optimization using Proportional Hazards Model (PHM) is a kind of maintenance optimization problem in which inspections of a system relevant to its failure rate depending on the age and value of covariates are performed in time intervals. The general approach for constructing a CBM based on PHM for a system is to minimize a long run average cost per unit of time...


Journal title:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 18, Issue 9

Pages: -

Publication date: 2010